with a system that will permit the collection
of valid, reliable, relevant and useful information
about the performance of the students.
This information should be analysed in a 
manner that will bring improvement to the 
current program” (M. Raupp, unpublished 
document). 


This article will discuss the procedures the 
CLC undertook to make a positive change 
to the assessment system. Other teachers
and administrators may find these procedures
a useful guide to creating more meaningful
ways to assess their students.


Table 1: Past and Present Assessment Procedures at the Language Center


1990-2000: Traditional Paper-and-Pencil Test Items

1. Fill in the Blanks. Complete the five sentences below
with the correct personal pronoun or possessive adjective
(choose from the following: he/she/we/they/it/his/her/our/their).

a. John is American. ____ last name is Stevens.

b. Lisa and I are in Room 10. ____ room is very nice.

c. Robert and David are brothers. ____ are from California.

d. The keys are in Linda’s bookbag. ____ bookbag is brown.

e. Joseph and Kate’s phone number is 555-6608. ____ are brother and sister.

2. Multiple Choice Listening Exercise. Listen to people
talking and check (✓) the correct information.

a) The woman is short and in her thirties.
The woman is medium height and in her twenties.
The woman is fairly short and about twenty-five.

b) The man had a great vacation in Paris last year in July.
The man hasn’t been to Paris, France yet.
The man can’t wait to go to Paris in August.

c) You shouldn’t go to Las Ramblas because that’s a very long street.
You shouldn’t miss some of the wonderful museums in Barcelona.
You should visit Spain in January.

3. Complete the Dialogue. Write the questions for the
following answers:

a) ____________?
Yes I do. I play volleyball.

b) ____________?
I play volleyball very well.

c) ____________?
I usually spend about two hours a day.

d) ____________?
Yes, Leila and Virna are pretty good at volleyball.

e) ____________?
Well, I have two sisters and one brother.

f) ____________?
No, we didn’t. We stayed home and relaxed.


2000-Present: New Performance-Based Test Tasks

1. ORAL PERFORMANCE

SKILL: Speaking

LEVEL: Beginner

TASK: You have five minutes to prepare a brief presentation
about yourself. In one to two minutes, state in complete
sentences:

a) Your full name and the way you spell your last name

b) Your age and phone number

c) Where you and your family are from

Students can also be given an interesting topic and time to
prepare and then be interviewed or asked to make an oral
presentation (one minute or less for beginners and longer
for intermediate and advanced students).

2. LISTENING TASK

SKILL: Listening

LEVEL: Intermediate

TASK: Interesting and relevant audio- and video-taped
material followed by open-ended questions and/or
multiple-choice items.

The teacher selects a familiar topic to conduct a partial
dictation (certain omitted words are filled in by the students as
the teacher reads) and a graduated dictation (students write
down the dictation as the teacher reads progressively longer
sentences).

(Adapted from Bailey 1998, 15-18)

3. WRITING TASK

SKILL: Writing

LEVEL: Intermediate

TASK: A Hypothetical Interview. What famous person
would you like to interview? Why? In two paragraphs
prepare an interview plan. In the first paragraph mention
who you would like to interview and why. In the second
paragraph, prepare five questions you would like to ask this
person (things you think other people would like to know).

4. INTEGRATED SKILLS TASK

SKILL: Reading and Writing

LEVEL: Advanced

TASK: After reading a job announcement, write a business
letter requesting an application and then fill out the
application using the attached form.

(Adapted from Bailey 1998, 209-212)

Dissatisfaction with traditional assessment

The CLC’s desire to transform its assessment
procedures grew out of dissatisfaction
with the following aspects of traditional 
assessment: 

• Teachers were encouraged to utilize
communicative approaches but assessed
their students with traditional
paper-and-pencil tests.

• Teachers observed that traditional
assessment was not reflecting students’
actual potential.

• Many traditional test items had poor
content validity, which means the tests
did not adequately measure the language
skills that were being taught in
the classroom (Popham 1981; Davies
1990; Heaton 1990).

• Different teachers who rated the same
student compositions did not have
an agreed-upon judging process and
therefore gave significantly different
scores, resulting in highly subjective
assessment and low interrater reliability
(McNamara 1996; Gamaroff 2000).

• Teachers were either not provided with
or had not developed level descriptors,
which are concise statements describing
the character of a minimally acceptable
performance of an oral or written
presentation (McNamara 2000). (See the
Appendix for examples of level descriptors
for an oral presentation.)

• Many traditional test items did not
have construct validity, which means
that they were not grounded in the
theory of language acquisition that
informs the communicative approach
or the communicative methods being
applied in the classroom (Popham
1981; McNamara 2000).

• The traditional testing produced negative
instead of positive washback (also
known as backwash), which is “the
impact of tests on the teaching
programme” (McNamara 1996, 23). In
other words, traditional tests became
the main focus of language instruction
and did not contribute to student
learning in a positive way.


Initiation of performance-based assessment

At the beginning of the transition to
performance-based testing, the specialist
provided the Assessment Project participants
with literature pertinent to testing and
measurement (see Davies 1990; Bailey 1998;
and McNamara 2000). This literature served
two main purposes: (1) it provided an initial
frame of reference on testing and measurement,
and (2) it became a guide throughout
the process of elaboration, administration,
and refinement of the new
performance-based testing program.

During the initiation of the new
performance-based program, the specialist presented
an essential overview of certain concepts
fundamental to language assessment, including
practicality, validity, and reliability.

Practicality 

A quality assessment program typically 
requires the allocation of many resources, 
including materials, funds to hire outside 
experts, and the time of administrators and 
teachers. Therefore, an educational institution
must be practical as it determines how
to best dedicate the available resources while
developing valid and reliable tests that
promote positive washback (Bailey 1998).

Validity and reliability 

All assessment programs must consider 
the validity and reliability of the testing 
instruments under creation. According to 
Popham (1981), validity is obtained if it can 
be demonstrated that the testing instrument 
is appropriate for the skills that one wants to 
measure, and reliability occurs when the testing
instrument yields consistent results over
repeated administrations. 
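
One common way to put a number on "consistent results over repeated administrations" is to correlate the scores from two sittings of the same test. The sketch below is only an illustration of that idea, not a description of the CLC's own analysis: the function name `pearson_r` and the score lists are hypothetical, and the Pearson correlation is a technique chosen here for the example.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Co-variation of the two sittings around their own means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (0 to 10 scale) for five students on two sittings.
first_sitting = [6.0, 7.5, 8.0, 5.5, 9.0]
second_sitting = [6.5, 7.0, 8.5, 5.0, 9.0]
print(round(pearson_r(first_sitting, second_sitting), 2))
```

A value close to 1 suggests that the instrument ranks students in much the same way each time it is given; a value near 0 suggests that it does not.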

Valid tests have a clear and demonstrable
relationship with the actual skills being
assessed, and the developers must follow precise
guidelines to ensure this relationship. Data
is collected at every stage of test development
to document validity, which is also measured
by the statistical results obtained from
questionnaires and pre- and post-testing scores.

Reliable tests are administered in a consistent
manner to all test-takers, and the developers
must eliminate any conditions that
might make the testing experience different
from one student to the next. This includes
making sure that the testing environment
is identical for all students, that all raters
administer and score the tests in a standardized
manner, and that all students have a clear
picture of what is expected of them. These
conditions can be met by publishing test
development and administration guidelines
for teachers and test content information for
the students.

Valid and reliable tests are likely to produce
positive washback. For example, when
a test is linked to what students are learning
in class, they will experience testing as an
extension of classroom work. In contrast, if
the test is not specifically related to classroom
instruction, the experience will be stressful
and will cause students to attempt to memorize
a large number of language items, which
leads to short-term learning. In addition,
when students know what to expect and the
grading criteria are clear, testing will be more
informative and result in positive washback.
For example, if a student taking a
paper-and-pencil test graded on a 0 to 10 scale receives
a score of 7.7, he or she might ask: “What
does this tell me?” The student does not know
specifically what a good performance means
because the criteria for grading have not been
made clear and there is a disconnect between
teaching and assessment. On the other hand,
if a student writes a composition about summer
vacation and notes: “I think I deserve a
good grade on this essay because it has a good
title, an introduction and a conclusion, and
it’s not boring,” then it is clear the student
knows the required elements of good writing
and how to achieve a good score. The
second student assessment demonstrates the
inseparable nature of teaching, learning, and
assessment (Raupp 2003).

Stages to implement performance-based assessment

As described below, the implementation of 
the new performance-based assessment plan 
at the CLC took place in three stages over a 
period of one year. (See Table 2 for a summary
of these stages.)

Stage 1 activities 

1. Select teachers who represent the
beginner, intermediate, and advanced
levels of the English course (based on
individual competencies and length of
experience with different levels) to
participate in test development.

2. Define the hours of instruction
required for each level: 200 for Beginner,
200 for Intermediate, and 100 for
Advanced.

3. Develop a performance objectives
continuum to guide teachers and describe
what is expected from students at
the end of each cycle in (a) Oral
Production; (b) Reading Comprehension;
(c) Listening Comprehension; and
(d) Written Production.

4. Discuss, revise, and finalize the
performance objectives continuum.

5. Construct a new four-point grading
system (A, B, C, and D) to measure
student performance.

6. Develop rating grids with level
descriptors based on the four-point grading
system. To reduce subjective
interpretations, this requires carefully worded
descriptors as well as the repeated
training and monitoring of raters to
make sure they assign consistent scores
and achieve adequate levels of interrater
reliability (McNamara 1996). (See the
Appendix for a sample rating grid.)

7. Establish two assessment periods for
the four skills during the course: one
at the end of the first 8 weeks and the
other at the end of the course, at
16 weeks.

8. Establish a pass/fail cutoff score for the
Beginner, Intermediate, and Advanced
level students.
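
The monitoring of raters mentioned in activity 6 can be supported by a simple agreement statistic. The sketch below uses Cohen's kappa, which compares the agreement two raters actually reach with the agreement expected by chance. It is an illustrative assumption rather than the CLC's documented procedure: the function name `cohens_kappa` and the sample grades are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters grading the same performances."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must grade the same non-empty set")
    n = len(rater_1)
    # Proportion of performances on which the two raters gave the same grade.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1 = Counter(rater_1)
    counts_2 = Counter(rater_2)
    # Agreement expected if each rater assigned grades at random
    # according to his or her own grade frequencies.
    expected = sum(counts_1[g] * counts_2[g] for g in counts_1) / (n * n)
    if expected == 1:
        # Both raters used one identical grade throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical grades from two raters for the same eight oral presentations.
rater_1 = ["A", "B", "B", "C", "D", "C", "B", "A"]
rater_2 = ["A", "B", "C", "C", "D", "C", "B", "B"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # prints 0.65
```

Here the raters agree on six of eight performances (0.75), but a quarter or so of that agreement would be expected by chance, so kappa is lower than the raw agreement rate. Low values would point to descriptors that need rewording or raters who need further training.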

Stage 2 activities 

1. Identify and select appropriate testing 
instruments to measure the desired 
performance. 

2. Create an instrument bank, or a collection
of testing items and tasks that
meet the pre-established criteria of
validity and reliability.

3. Begin development of an Assessment
Guide with thorough information for
teachers about the new test development
and administration procedures.

4. Plan to identify signs of failing as
early in the course as possible and
provide remedial class interventions
for students who need extra assistance.
(Because the CLC had teacher trainees
available, it was feasible to offer
remedial classes throughout the year
to students whose performance needed
improvement.)

Stage 3 activities 

1. Revise and produce final drafts of support
materials, including the Assessment
Guide and a letter explaining the testing
changes to students and parents.

2. Conduct training sessions with the
English teaching staff.

3. Pilot test instruments on pre-selected
groups of Beginner, Intermediate, and
Advanced students.

4. Analyze qualitative and quantitative
data, identify problems, and recommend
solutions.

5. Publish results and implement changes
to improve the quality of the instructional
program.

6. Extend training sessions to the teachers
of the other languages.


Table 2: Stages in Implementing Performance-Based Assessment at the Language Center


STAGE 1: JULY 2000

ACTIVITIES

■ Determination of the instruction cycles

■ Development of a performance objectives continuum

■ Construction of a new grading system (A, B, C, and D)

■ Development of rating grids with level descriptors for each of the four language skills

■ Establishment of a pass/fail cutoff score

STAGE 2: JANUARY 2001

ACTIVITIES

■ Creation of an instrument bank of valid and reliable items and tasks

■ Identification and selection of appropriate instruments

■ Production of a draft Assessment Guide

■ Creation of a plan for remedial classes

STAGE 3: JULY 2001 onwards

ACTIVITIES

■ Revision and final draft of Assessment Guide and other support materials

■ Training sessions for teaching staff

■ Piloting of instruments on pre-selected groups

■ Analysis of data, identification of problems, and recommendations

■ Publication of results and implementation of changes

■ Introduction of training sessions for teachers of other languages

AGENTS OF CHANGE

■ A specialist in program evaluation and language assessment

■ 5 permanent staff who are teachers of English

■ 4 teacher trainees

■ 2 pedagogic coordinators

Overview of results

Six years have passed since the CLC first
began the transition from paper-and-pencil
tests to performance-based assessment; during
that time, pilot testing helped identify where
to revise and improve the program. Pilot testing
“instruments before actually employing
them in final data collection is paramount”
(Weir and Roberts 1994, 138). After the
piloting in Stage 3 on pre-selected groups, the
Assessment Project participants collected and
analyzed qualitative data (test-taker feedback,
teachers’ reports on student progress,
administration reports) and quantitative data
(statistical analysis of scores and interrater reliability,
reliability of pre- and post-tests). As a result,
we made the following adjustments:

1. We identified and revised some tests or
test components that were too easy or
too difficult.

2. We revised the four-letter grading scale
by replacing the letter grades with numbers
to allow for averaging of scores
for pass/fail decision-making and to
achieve more reliable overall results.

3. Instead of assessing students twice (at 
the midpoint and at the end), we now 
assess student performance at three 
intervals during the 16-week course. 
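
The second adjustment, replacing letters with numbers so that scores can be averaged, can be sketched as follows. The article does not give the actual numeric scale or the cutoff, so the mapping in `GRADE_POINTS`, the value of `PASS_CUTOFF`, and the function name `overall_result` are all assumptions made for illustration.

```python
# Assumed mapping from the original letter grades to numbers.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

# Assumed pass/fail cutoff on the averaged score.
PASS_CUTOFF = 2.0


def overall_result(skill_grades, cutoff=PASS_CUTOFF):
    """Average the numeric grades across skills and apply the cutoff."""
    points = [GRADE_POINTS[grade] for grade in skill_grades.values()]
    average = sum(points) / len(points)
    return average, average >= cutoff


# Hypothetical grades for one student in the four assessed skills.
student = {
    "Oral Production": "B",
    "Reading Comprehension": "C",
    "Listening Comprehension": "B",
    "Written Production": "D",
}
average, passed = overall_result(student)
print(average, passed)  # prints 2.25 True
```

With letters alone, there is no agreed way to combine a B, a C, a B, and a D into one decision; with numbers, the average and the cutoff make the pass/fail rule explicit and the same for every student.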

Benefits of the new assessment process 

As one of the participants and informal 
evaluators of the new assessment process, I 
observed the following beneficial results: 


• The assessment procedures are clearer
for everyone since the desired level of
student performance and scoring criteria
are clearly established.

• The mismatch between testing and
teaching is greatly reduced because
teaching activities are geared to the
performance objectives and assessment.

• Teachers utilize fewer grammar-driven
activities and more real-world
communicative tasks.

• The assessment instruments strongly
correspond with the subject matter
being taught and how it is being
taught, increasing the content validity
of the tests.

• The testing changes allow the teachers
to document student progress systematically
through formative assessment
(daily in the classroom) and summative
assessment (at the end of each level).

• The standardized administration, rating,
and grading of the tests have
increased the reliability of the assessment
process.

• Teachers who participated in the process
have a sense of “ownership” of the
project.

Table 3 summarizes some of the benefits
that resulted from the transition from traditional
paper-and-pencil assessment to
performance-based assessment.




Table 3: Transition from Traditional to Performance-Based Assessment

(adapted from Bailey 1998, 207)


One-shot tests → Continuous assessment

Textbook-based tests → Classroom performance tests

Inauthentic tests → More real-world assessment

Decontextualized test tasks → Contextualized test tasks

No feedback provided to learners → Feedback provided to learners in four skills

Subjective correction and grading → Standardized scoring criteria

No test follow-up → Remedial classes available

Negative washback → Positive washback

Conclusion

Teachers and students have reacted positively
to the new assessment procedures at
the CLC, where testing has become a lever for
instructional improvement. The EFL program
now has a valid and reliable testing system to
diagnose student strengths and weaknesses
and identify staff development needs. Most
importantly, the changes have not yielded a
finished product because they are related to
performance objectives and not to a specific
textbook, which leaves room for adaptation
and further change if necessary.

Sample Dialogue

Using Replacement Performance Role-Plays... • Maria Snarski


Scene: Two students walking toward class and talking about the upcoming exam.

Student A: Good morning!

Student B: Morning, are you ready for the exam?

Student A: No, I didn’t really have a chance to study, but I have a little help in case I need it.
(flashes a cheat sheet)

Student B: You’re going to cheat?

Student A: Only if I have to. I didn’t have time to study last night.

(They walk into the classroom, and Student A takes a seat next to Student B.)

Teacher: Good morning, class. As you know, there is an exam today. Please remove your
books from your desks and just have your pencils ready. You will have 30 minutes
for the exam. When you are finished, you may leave.

Scene: Student A visibly needs to cheat and tries looking at Student B’s paper and looking
at the cheat sheet, avoiding being caught by the teacher.

Student A finishes first and accidentally drops the cheat sheet. It lands near Student A.
Student A leaves. Later, the teacher sees the cheat sheet and believes it
belongs to Student B. The teacher questions Student B about the paper.

Oral Performance Rating Grid with Level Descriptors

A Paradigm Shift: From Paper-and-Pencil Tests... • Leni Puppin


Purpose: To identify student skill level in important language components and
to record language progress.

Task: Given a familiar topic and five minutes to prepare, the student will make a
coherent one- or two-minute oral presentation.


LANGUAGE COMPONENT: Fluency

Level D: Hesitant, makes repeated long pauses searching for ways to express
him/herself. Often forced into silence by language limitations. Discourse
is disconnected.

Level C: Speech is frequently disrupted by the student’s search for the correct
manner of expression. Frequently has problems linking ideas together in a
logical sequence.

Level B: Speech is generally fluent with occasional lapses while student
searches for the correct manner of expression. Can on occasion link ideas
together in a logical sequence.

Level A: Speech is fluent and rarely hesitant. Ideas are linked in a logical
sequence.


LANGUAGE COMPONENT: Vocabulary

Level D: Misuse of words and very limited vocabulary make comprehension quite
difficult. Resorts to L1 to fill in vocabulary gaps.

Level C: Frequently uses the wrong words. Conversation is somewhat limited
because of insufficient vocabulary. Words are often repeated.

Level B: In general, uses appropriate terms and words. Occasionally must
rephrase ideas because of vocabulary limitation.

Level A: Choice of words indicates a broad knowledge of vocabulary. Uses
appropriate terms and words to express ideas.


LANGUAGE COMPONENT: Pronunciation

Level D: Very hard to understand because of pronunciation problems.
Consistently needs to repeat words or sentences to be understood. Rarely
uses appropriate intonation.

Level C: Makes him/herself understood, though pronunciation problems
necessitate concentration on the part of the listener and occasionally lead
to misunderstandings. Frequently uses inappropriate intonation.

Level B: Intelligible most of the time, though a definite foreign accent is
noticed in his/her speech. Occasionally uses inappropriate intonation.

Level A: Always intelligible, although a foreign accent that does not impede
communication is noticeable in his/her speech. Errors in pronunciation are
rare. Almost always uses appropriate intonation.


LANGUAGE COMPONENT: Grammar

Level D: Grammar, word order, and verb tense errors make comprehension
difficult. Restricts him/herself to the simplest grammatical structures or
leaves sentences unfinished. Uses isolated words to express ideas.

Level C: Makes grammar, word order, and verb tense errors, which frequently
obscure meaning and impede communication. Restricts him/herself to simple
grammatical structures.

Level B: Makes occasional grammar, word order, and verb tense errors, which
do not always obscure meaning.

Level A: Rarely makes grammar, word order, and verb tense errors that obscure
meaning. Shows some degree of sophistication in the sequencing of tenses.